To G. Henkin on his 65th birthday
Abstract 

We address in this paper the following two closely related problems:
1. How to represent functions with singularities (up to a prescribed
accuracy) in a compact way? 2. How to reconstruct such functions from a
small number of measurements? The stress is on a comparison of linear and
non-linear approaches. As a model case we use piecewise-constant functions
on [0, 1], in particular, the Heaviside jump function
$\mathcal{H}_t = \chi_{[0,t]}$. Considered as a curve in the Hilbert space
$L^2([0, 1])$, it is completely characterized by the fact that any two of
its disjoint chords are orthogonal. We reinterpret this fact in the
context of step-functions in one or two variables.

Next we study the limitations on representability and reconstruction of
piecewise-constant functions by linear and semi-linear methods. Our main
tools in this problem are Kolmogorov's n-width and ε-entropy, as well as
Temlyakov's (N, m)-width.

On the positive side, we show that a very accurate non-linear reconstruc- 
tion is possible. It goes through a solution of certain specific non-linear 
systems of algebraic equations. We discuss the form of these systems and 
methods of their solution, stressing their relation to Moment Theory and 
Complex Analysis. 

Finally, we informally discuss two problems in Computer Imaging which 
are parallel to the problems 1 and 2 above: compression of still images 
and video-sequences on one side, and image reconstruction from indirect 
measurement (for example, in Computer Tomography), on the other. 
Linear problems are all linear alike; every non-linear problem
is non-linear in its own way.

M. Livshitz 



1. Introduction 

In this paper we discuss the following two basic problems: 

1. How to represent functions with singularities (up to a prescribed 
accuracy) in a compact way? 

2. How to reconstruct such functions from a small number of mea- 
surements? 

We consider both problems mainly from the point of view of a
comparison between linear and non-linear approaches.

We study in detail a model case of piecewise-constant functions on
[0, 1], which, as we believe, reflects many important issues of the general
situation. Considered as curves (or surfaces of higher dimension) in the
Hilbert space $L^2([0, 1])$, the families of piecewise-constant functions
with variable jump-points form a nice geometric object: so-called "crinkled
arcs". They are characterized by the fact that any two of their disjoint
chords are orthogonal. A remarkable classical fact is that any two such
curves are isometric, up to a scale factor. We reinterpret this fact in the
context of step-functions in one or two variables.

Next we study the problem of representability of piecewise-constant
functions by linear and semi-linear methods. Our main tools in this
problem are Kolmogorov's n-width and ε-entropy ([33], [58]), as well as
Temlyakov's (N, m)-width ([56]). See also ( [HI SU El] ) and references
there for similar estimates.

Then we turn to the reconstruction problem. We start with a neg- 
ative result: based on our computation of Kolmogorov's n-width of 
piecewise-constant functions, we provide limitations on the accuracy of 
linear methods of reconstruction of such functions from measurements. 

On the contrary, we show, following [HI [13], [131 M, US M, US 
[19] . [311 1321 [51] [52] . that a very accurate non-linear reconstruction 
is possible. It goes through a solution of certain specific non-linear 
systems of algebraic equations. We discuss a typical form of these 
systems and certain approaches to their solution, stressing the relations 
with Moment Theory and Complex Analysis. See also [5U] , RED] EE] where 
a similar approach is presented from a quite different point of view. 

We believe that the key to a successful application of the "algebraic 
reconstruction methods" presented in this paper to real problems in 
Signal Processing lies in a "model-based" representation of signals and 
especially of images. This is a very important and difficult problem 
by itself (see [381 SHI EH HI [20] and references there). In the last
section we informally discuss this problem together with two other closely
related problems in Computer Imaging (which are parallel to the prob- 
lems 1 and 2 above): compression of still images and video-sequences 
on one side, and image reconstruction from indirect measurement (for 
example, in Computer Tomography), on the other. 

Our main conclusions are as follows:

1. If we insist on approximating the whole family of piecewise-constant
functions, with variable positions of jumps, by the same linear subspace
(Kolmogorov n-width), then the Fourier expansion is essentially optimal.
Any other linear method will provide roughly the same performance: with
linear combinations of n terms we get an approximation of order
$1/\sqrt{n}$. This concerns both the "compression" and
the "reconstruction from measurements" problems.

2. If for each individual piecewise-constant function we are allowed
to take its own "small" linear combination of elements of a certain
fixed "large" basis ("sparse approximations"), then with a linear
combination of n terms we get an approximation of order $q^n$, $q < 1$.

3. The "non-linear width" approach (Temlyakov's (N, m)-width) 
provides a natural interpolation between the Fourier expansion, the 
sparse approximations and the direct non-linear representation. 

4. The "naive" direct non-linear representation of piecewise-constant
functions, where we explicitly memorize the positions of the jumps
$0 < x_i < 1$, $i = 1, \dots, N$, and the values $A_i$ of the function
between the jumps, provides the best possible compression (not a big
surprise!). However, these parameters can be reconstructed from a small
number of measurements (Fourier coefficients) in a robust way, via solving
non-linear systems of algebraic equations.

5. Extended to piecewise-polynomials, and combined with a polyno- 
mial approximation, the last result provides an approach to an impor- 
tant and intensively studied problem of a noise-resistant reconstruction 
of piecewise-smooth functions from their Fourier data. 

Let us stress that the problem of an efficient reconstruction of "simple"
("compressible") functions from a small number of measurements
has recently been addressed in a very convincing way in the "compressed
sensing", "compressive sampling", and "greedy approximation"
approaches (see [8j [9j [101 [161 EL71 El E3 EZ] and references there). Our
approach is different, but some important similarities can be found
via the notion of "semi-algebraic complexity" ([601 EI]). We plan to
present some results in this direction separately.

The third author would like to thank G. Henkin for very inspiring
discussions of some topics related to this paper. Both the complexity
of approximations and the moment inversion problem intersect with
Henkin's fields of interest, and we hope that some of his results (see
especially [15], [29], [30]) may turn out to be directly relevant to the
non-linear representation and reconstruction problems discussed here.

2. Families of piecewise-constant functions in $L^2([0,1])$

In this paper we mostly concentrate on one specific case of
piecewise-constant functions, namely, on the family of step (or Heaviside)
functions $H_t(x)$ defined on [0, 1] by $H_t(x) = 1$ for $x \le t$ and
$H_t(x) = 0$ for $x > t$. All the results in Section 2 below remain valid
(with minor modifications) for any family of piecewise-constant functions
on [0, 1] with a fixed number N of variable jumps.

A remarkable geometric fact about the curve
$\mathcal{H} = \{H_t(x),\ t \in [0,1]\} \subset L^2([0, 1])$ is that any
two of its disjoint chords are orthogonal. So the curve $\mathcal{H}$
instantly changes its direction at each of its points: it is as
"non-straight" as possible. Such curves are called "crinkled arcs"
and we study them in more detail in Section 2.1.

Notice that a general family of piecewise-constant functions on [0, 1]
with a fixed number N of variable jumps forms what can be called a
"crinkled higher-dimensional surface" in $L^2([0, 1])$, at least with
respect to the jump coordinates: any two chords from the same point,
corresponding to jump shifts in opposite directions, are orthogonal.

2.1. "Crinkled arcs". As above, we define the curve
$\mathcal{H} : [0,1] \to L^2([0, 1])$ by $\mathcal{H}_t = \chi_{[0,t]}$.
This curve is continuous, and it has the following geometric property: any
two disjoint chords of it are orthogonal in $L^2([0, 1])$. Indeed, such
chords are given by the characteristic functions of two non-intersecting
intervals. Intuitively, the curve $\mathcal{H}$ exhibits a "very
non-linear" behavior: its direction in $L^2([0, 1])$ rapidly changes.
Now let X be a general Hilbert space.

Definition 2.1. A curve $\psi : [0, 1] \to X$ in a Hilbert space X is called
a crinkled arc if:

• it is continuous

• any two disjoint chords of it are orthogonal, namely, for
$0 \le s < t \le s' < t' \le 1$ we have:

(2.1)  $\langle \psi_t - \psi_s,\ \psi_{t'} - \psi_{s'} \rangle = 0$

More details are given in the classical book of Halmos [28]. See, in
particular, [28, problems 5-6]. The curve $\mathcal{H}$ provides the main
example of a crinkled arc.
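
The orthogonality of disjoint chords of $\mathcal{H}$ can be checked directly. The following is a minimal sketch, not part of the original argument: the chord $\mathcal{H}_t - \mathcal{H}_s$ is the characteristic function of the interval (s, t], so the inner product of two chords is the length of the overlap of the corresponding intervals. The function names `chord_inner` and `chord_norm` are hypothetical.

```python
import math

def chord_inner(s, t, s2, t2):
    """Inner product in L^2([0,1]) of the chords chi_(s,t] and chi_(s2,t2].

    It equals the length of the intersection of the two intervals.
    """
    overlap = min(t, t2) - max(s, s2)
    return max(0.0, overlap)

def chord_norm(s, t):
    """L^2 norm of the chord chi_(s,t], i.e. sqrt(t - s)."""
    return math.sqrt(chord_inner(s, t, s, t))

# Disjoint chords are orthogonal; overlapping chords are not.
print(chord_inner(0.1, 0.3, 0.3, 0.8))   # disjoint intervals
print(chord_inner(0.1, 0.5, 0.4, 0.8))   # intervals overlapping on (0.4, 0.5]
print(chord_norm(0.0, 0.25))             # norm grows like the square root of the length
```

The square-root growth of the chord norm is what makes the curve continuous but nowhere "straight".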

Crinkled curves are preserved by certain natural transformations. 
Namely, one can perform 

• translation 

• scaling 

• reparametrization 

• application of a unitary operator. 

Then the result is still a crinkled arc. A simple and surprising
theorem is that these are the only ways to obtain a crinkled arc:
any two crinkled arcs are connected by these transformations:

Theorem 2.2. Let $\psi : [0, 1] \to X_1$ and $\varphi : [0, 1] \to X_2$ be
two crinkled arcs in two different Hilbert spaces. Then there are two
vectors $v_i \in X_i$, $i = 1, 2$, a reparametrization
$f : [0,1] \to [0,1]$, a positive number $a$ and a (partial) isometry¹
$U : X_1 \to X_2$ of the Hilbert spaces s.t.

(2.2)  $U(\psi_{f(t)} - v_1) = a\varphi_t - v_2$

Proof: See [28, p. 169].

Corollary 2.3. Let $\psi : [0, 1] \to X$ be a crinkled arc. Then it can
be obtained from $\mathcal{H}$ by a translation, scaling, reparametrization,
and an application of a unitary operator between the appropriate Hilbert
(sub-)spaces.

Therefore, if we consider only geometric properties of curves inside
the Hilbert space, then the curve $\mathcal{H}$ can be taken as a model for
any crinkled arc.

While any two Hilbert spaces are isomorphic, their "functional"
realizations may be quite different. Consider, for example, the space
$L^2(Q^2)$ of the square integrable functions on the unit two-dimensional
cell $Q^2 = [0, 1] \times [0, 1]$ (we shall later refer to such functions
as "images").

The two families of functions shown in Figure 1 clearly
represent crinkled arcs in $L^2(Q^2)$. Indeed, their disjoint chords are
given by the characteristic functions of certain concentric non-intersecting
domains in $Q^2$, and hence they are orthogonal. By Corollary 2.3,
each of these curves is isomorphic to the curve $\mathcal{H}$ in
$L^2([0, 1])$. Let us state a general proposition in this direction.
Consider a family $D_t \subset Q^n$, $t \in [0, 1]$, of "expanding domains"
in the n-dimensional cell $Q^n = [0, 1]^n$, $D_{t_1} \subset D_{t_2}$ for
any $t_1 < t_2$. Consider the curve $S(t)$ in $L^2(Q^n)$ defined by
$S(t) = \chi_{D_t} \in L^2(Q^n)$.

¹ A partial isometry $U : X_1 \to X_2$ between Hilbert spaces is an
isometry between $(\operatorname{Ker} U)^\perp$ and $\operatorname{Im} U$.

Proposition 2.4. S(t) is a crinkled curve.

Proof: Any two disjoint chords of the curve S are given by the
characteristic functions of certain concentric non-intersecting domains in
$Q^n$, and hence they are orthogonal in $L^2(Q^n)$.

By Corollary 2.3, each of the curves S obtained as above is isomorphic
to the curve $\mathcal{H}$ in $L^2([0, 1])$.
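
As a small illustration of Proposition 2.4 (a sketch under our own choice of domains, not taken from the text), take the nested squares $D_t = [0,t] \times [0,t]$ in $Q^2$. Then $\langle \chi_{D_a}, \chi_{D_b} \rangle = \min(a,b)^2$, and the inner product of two chords follows by bilinearity. The function names are hypothetical.

```python
def nested_square_inner(a, b):
    """<chi_{D_a}, chi_{D_b}> for the nested squares D_t = [0,t] x [0,t] in Q^2.

    The intersection of D_a and D_b is the smaller square, of area min(a,b)^2.
    """
    return min(a, b) ** 2

def chord_inner_2d(s, t, s2, t2):
    """Inner product of the chords S(t)-S(s) and S(t2)-S(s2), by bilinearity."""
    g = nested_square_inner
    return g(t, t2) - g(t, s2) - g(s, t2) + g(s, s2)

# Chords over disjoint parameter intervals are orthogonal.
print(chord_inner_2d(0.2, 0.5, 0.5, 0.9))
# Chords over overlapping parameter intervals share the region D_0.6 \ D_0.4.
print(chord_inner_2d(0.2, 0.6, 0.4, 0.9))
```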




Figure 1. Two families of functions in $L^2(Q^2)$ that have
the same Hilbert space geometric properties as
$\mathcal{H} \subset L^2(Q^1)$

If the domains evolve in time in a more complicated way (in particular,
if their boundaries are deformed in a non-rigid manner), then
the corresponding curve formed in $L^2(Q^n)$ may not be exactly a crinkled
arc. However, the following proposition shows that typically such
trajectories look like crinkled arcs "in a small scale".

Proposition 2.5. Let $C_t$, $t \in [0,1]$, be a generic smooth family of
closed non-intersecting curves in $Q^2$. Consider the corresponding curve
in $L^2(Q^2)$. Then the angle between any two disjoint chords of this curve
tends to $\frac{\pi}{2}$ as these chords tend to the same point.
Proof: Let us assume that the curves $C_t(\tau)$ are parametrized by
$\tau \in [0,1]$, $C_t(0) = C_t(1)$. Because of the genericity assumption we
can assume that for each $t \in [0,1]$ the derivative has a finite
number of zeroes $\tau_1, \dots, \tau_m$ and it preserves its sign between
these zeroes. Therefore, the chords of $C_t$ are the characteristic
functions of the domains as shown in Figure 2. Specifically, the
intersections of these domains are concentrated near the zeroes
$\tau_1, \dots, \tau_m$ of this derivative.
Clearly, the area of the possible overlapping parts of these domains is
of a smaller order than the area of the domains themselves.




Figure 2. Two chords of the family $C_t$



2.2. ε-Entropy of $\mathcal{H}$. From now on we compute the ε-entropy and
the linear and non-linear widths only for the curve $\mathcal{H}$ in the
space $L^2([0, 1])$. All these quantities depend only on the curve, and not
on its parametrization, and they are preserved by the isometries of the
ambient Hilbert spaces. To exclude the influence of scalar rescaling we can
normalize our curves, for example, assuming that the distance between the
end-points is one. Then by Corollary 2.3 the ε-entropy and the linear and
non-linear widths are exactly the same for each crinkled curve.

Let us now recall the general definition of ε-entropy. Let $A \subset X$ be
a relatively compact subset in a metric space X.

Definition 2.6. For $\epsilon > 0$ the covering number $M(\epsilon, A)$ is
the minimal number of closed ε-balls in X covering A. The binary logarithm
of the covering number, $H(\epsilon, A) = \log M(\epsilon, A)$, is called
the ε-entropy of A.






See [33j G2] and many other publications for the computation of ε-entropy
in many important examples. Intuitively, the ε-entropy of a set A is the
minimal number of bits we need to memorize a specific element of
this set with the accuracy ε. Thus it provides a lower bound for the
"compression" of A, independently of the specific compression method
chosen.

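Proposition 2.7. For the curve $\mathcal{H}$ in the space $L^2([0, 1])$ we
have

(2.1)  $M(\epsilon, \mathcal{H}) \asymp (\frac{1}{\epsilon})^2, \quad H(\epsilon, \mathcal{H}) \sim 2\log(\frac{1}{\epsilon})$.

Here the sign $\asymp$ is used as an equivalent to the inequality

$C_1 (\frac{1}{\epsilon})^2 \le M(\epsilon, \mathcal{H}) \le C_2 (\frac{1}{\epsilon})^2$

for certain $C_1$ and $C_2$, and for all sufficiently small ε. The sign
$\sim$ shows that $C_1$ and $C_2$ tend to 1 as ε tends to zero.

Proof: Let us subdivide uniformly the interval [0, 1] into N segments
$\Delta_j$ by the points $t_j = \frac{j}{N}$. We have

$\|\mathcal{H}(t_{i+1}) - \mathcal{H}(t_i)\| = (\int_{t_i}^{t_{i+1}} dt)^{1/2} = (\frac{1}{N})^{1/2}$.

Hence for any $\epsilon < \frac{1}{2\sqrt{N}}$ no ε-ball contains two
different points $\mathcal{H}(t_i)$, $i = 1, \dots, N$, of the curve
$\mathcal{H}$. Thus, we need at least N such ε-balls to cover
$\mathcal{H}$, while the balls of radius $\frac{1}{\sqrt{N}}$ centered at
the points $\mathcal{H}(t_i)$, $i = 1, \dots, N$, cover the entire curve
$\mathcal{H}$. This completes the proof.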
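
The covering used in this proof is easy to reproduce numerically. The sketch below is only an illustration, with hypothetical function names: it places centers at the grid points $j/n$, $j = 0, \dots, n$, with $n \ge 1/(2\epsilon^2)$, and checks on a sample of parameters that every $H_t$ is within ε of some center, using $\|H_t - H_c\| = \sqrt{|t - c|}$.

```python
import math

def cover_centers(eps):
    """Grid of jump points whose eps-balls cover the curve H.

    With spacing 1/n, every t is within 1/(2n) of a grid point, so the
    L^2 distance is at most sqrt(1/(2n)) <= eps once n >= 1/(2 eps^2).
    """
    n = math.ceil(1.0 / (2.0 * eps ** 2))
    return [j / n for j in range(n + 1)]

def worst_distance(centers, samples=10001):
    """Largest L^2 distance from a sampled H_t to the nearest center."""
    worst = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        d = min(math.sqrt(abs(t - c)) for c in centers)
        worst = max(worst, d)
    return worst

eps = 0.1
centers = cover_centers(eps)
print(len(centers), math.log2(len(centers)), 2 * math.log2(1 / eps))
print(worst_distance(centers))
```

The number of centers grows like $(1/\epsilon)^2$, so its binary logarithm grows like $2\log(1/\epsilon)$, in line with Proposition 2.7.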

2.3. Kolmogorov's n-width of $\mathcal{H}$. Let $A \subset V$ be a
centrally-symmetric set in a Banach space V.

Definition 2.8. [39] The Kolmogorov n-width $W_n(A)$ of the
set $A \subset V$ is defined as

(2.2)  $W_n(A) = \inf_{\dim L = n}\ \sup_{x \in A} \operatorname{dist}(x, L)$,

where the infimum is taken over all the n-dimensional linear subspaces
L of V, and dist(x, L) denotes the distance of the point x to L.

Intuitively, $W_n(A)$ is the error of the best possible approximation of A
by n-dimensional linear subspaces of V. Let us define also
$N(\epsilon, A)$ as the minimal n for which $W_n(A) \le \epsilon$.

To make the Kolmogorov n-width comparable with the ε-entropy,
we define the notion of a linear ε-entropy of A, which is the number
of bits we need to memorize A with the accuracy ε, if we insist on a
linear approximation of A (and if we "naively" memorize each of the
coefficients in this linear approximation):

Definition 2.9. A linear ε-entropy of A, $H_l(\epsilon, A)$, is defined by

(2.3)  $H_l(\epsilon, A) = N(\epsilon, A) \log(\frac{1}{\epsilon})$.

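Now we state the main result of this section:

Theorem 2.10. For the curve $\mathcal{H}$ in $L^2[0, 1]$ we have

$\frac{1}{4\sqrt{n}} \le W_n(\mathcal{H}) \le \frac{2}{\pi\sqrt{n-1}}, \quad N(\epsilon, \mathcal{H}) \asymp (\frac{1}{\epsilon})^2, \quad H_l(\epsilon, \mathcal{H}) \asymp (\frac{1}{\epsilon})^2 \log(\frac{1}{\epsilon})$.

Proof: It is enough to prove the bounds for the n-width of $\mathcal{H}$.
The corresponding bounds for $N(\epsilon, \mathcal{H})$ and
$H_l(\epsilon, \mathcal{H})$ follow immediately.

The upper bound for the n-width is obtained by considering the
Fourier series approximation of the Heaviside functions $H_t(x)$:

(2.4)  $H_t(x) = \sum_{k \in \mathbb{Z}} a_k e^{2\pi i k x}$.

Then $a_0 = t$ and $a_n = \frac{1 - e^{-2\pi i n t}}{2\pi i n}$ for
$n \ne 0$. We have $|a_n| \le \frac{1}{\pi |n|}$. Hence the $L^2$ error
$f_n$ of the approximation of any $H_t$ by the first 2n + 1 terms
of its Fourier series satisfies

(2.5)  $f_n \le [2\sum_{m=n+1}^{\infty} \frac{1}{\pi^2 m^2}]^{1/2} \le \frac{\sqrt{2}}{\pi\sqrt{n}}$.

And therefore

(2.6)  $W_{2n+1}(\mathcal{H}) \le f_n \ \Rightarrow\ W_n(\mathcal{H}) \le \frac{2}{\pi\sqrt{n-1}}$.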
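
The decay of the Fourier truncation error can be observed numerically. The following sketch is an illustration only, with hypothetical function names. It uses Parseval's identity with $\|H_t\|^2 = t$ and $|a_k|^2 = \sin^2(\pi k t)/(\pi k)^2$ (which follows from the formula for the coefficients of $\chi_{[0,t]}$ in the basis $e^{2\pi i k x}$), and compares the error with the elementary tail bound $\sqrt{2}/(\pi\sqrt{n})$ obtained from $|a_k| \le 1/(\pi |k|)$.

```python
import math

def fourier_tail_error(t, n):
    """L^2 error of approximating H_t by its Fourier terms with |k| <= n.

    By Parseval, error^2 = ||H_t||^2 - sum_{|k| <= n} |a_k|^2, where
    ||H_t||^2 = t, a_0 = t and |a_k|^2 = sin(pi k t)^2 / (pi k)^2.
    """
    kept = t * t + 2.0 * sum(
        (math.sin(math.pi * k * t) / (math.pi * k)) ** 2 for k in range(1, n + 1)
    )
    return math.sqrt(max(0.0, t - kept))

def tail_bound(n):
    """Upper bound sqrt(2) / (pi sqrt(n)) from |a_k| <= 1 / (pi |k|)."""
    return math.sqrt(2.0) / (math.pi * math.sqrt(n))

for n in (1, 10, 100, 1000):
    print(n, fourier_tail_error(0.3, n), tail_bound(n))
```

Both columns decay like $1/\sqrt{n}$, which is the slow rate responsible for the large linear ε-entropy.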

The proof of the lower bound we split into several steps.

Lemma 2.11. For a set
$A_k = \{e_i \mid \langle e_i, e_j \rangle = \delta_{ij},\ 1 \le i \le k\} \subset L^2[0, 1]$
and $n < k$ the following inequality holds:

$W_n(A_k) \ge \sqrt{\frac{k-n}{k}}$.

Proof: Denote $W = \operatorname{span}\{e_i\}$ and by
$P_W : L^2[0, 1] \to L^2[0, 1]$ the orthogonal projection onto W.

We take an n-dimensional subspace V. We can assume that $V \subset W$.
This is because for $v \in V$, $a \in A_k$ we have:

$\|a - v\|^2 = \|P_W(a - v)\|^2 + \|(I - P_W)(a - v)\|^2 = \|a - P_W v\|^2 + \|(I - P_W)v\|^2 \ge \|a - P_W v\|^2$.

Therefore $\operatorname{dist}(A_k, V) \ge \operatorname{dist}(A_k, P_W V)$,
and in order to minimize the distance we can assume $V \subset W$. Denote by
$P_V : W \to W$ the orthogonal projection onto V in W. We need to compute
$\max_{1 \le i \le k} \|(I - P_V)e_i\|$. But

(2.7)  $\max_{1 \le i \le k} \|(I - P_V)e_i\| \ge \sqrt{\frac{1}{k}\sum_{i=1}^{k} \|(I - P_V)e_i\|^2}$.

On the other hand,

(2.8)  $\sum_{i=1}^{k} \|(I - P_V)e_i\|^2 = \sum_{i=1}^{k} \langle (I - P_V)e_i, (I - P_V)e_i \rangle = \sum_{i=1}^{k} \langle (I - P_V)^2 e_i, e_i \rangle = \sum_{i=1}^{k} \langle (I - P_V)e_i, e_i \rangle = \operatorname{trace}_W(I - P_V) = k - n$.

The last equality holds because $I - P_V : W \to W$ is a projection onto a
(k - n)-dimensional subspace, the orthogonal complement of V in W.
Combining equations (2.7), (2.8) we have

$\operatorname{dist}(A_k, V) \ge \sqrt{\frac{k-n}{k}}$.

Corollary 2.12. For any $d \in \mathbb{R}$,

$W_n(dA_k) \ge |d| \sqrt{\frac{k-n}{k}}$.
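
The trace identity behind Lemma 2.11 can be tested in finite dimension. The sketch below is an illustration under our own assumptions (it works in $\mathbb{R}^k$ with the standard unit vectors and a random n-dimensional subspace; all names are hypothetical): the squared distances from the unit vectors to the subspace always sum to k - n, so the largest distance is at least $\sqrt{(k-n)/k}$.

```python
import math
import random

def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > 1e-12:
            basis.append([a / norm for a in w])
    return basis

def residuals(k, n, seed=0):
    """Distances from the unit vectors e_1..e_k of R^k to a random n-dim subspace V."""
    rng = random.Random(seed)
    basis = orthonormalize([[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)])
    # ||(I - P_V) e_i||^2 = 1 - sum_q q_i^2 for an orthonormal basis {q} of V.
    return [math.sqrt(max(0.0, 1.0 - sum(q[i] ** 2 for q in basis))) for i in range(k)]

k, n = 8, 3
r = residuals(k, n)
print(sum(x * x for x in r), k - n)           # the trace identity
print(max(r), math.sqrt((k - n) / k))         # the bound of Lemma 2.11
```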
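Proposition 2.13.

$W_n(\mathcal{H}) \ge \frac{1}{4\sqrt{n}}$.

Proof: Denote
$B_k = \{\chi_{[\frac{i-1}{k}, \frac{i}{k}]} \mid 1 \le i \le k\}$. This set
is formed by k orthogonal vectors of length $\frac{1}{\sqrt{k}}$. Clearly
$\chi_{[\frac{i-1}{k}, \frac{i}{k}]} = \mathcal{H}_{\frac{i}{k}} - \mathcal{H}_{\frac{i-1}{k}}$,
therefore

$\operatorname{dist}(B_k, V) \le 2\operatorname{dist}(\mathcal{H}, V)$

for any vector space V, and thus:

$W_n(B_k) \le 2W_n(\mathcal{H})$.

The norm of $\chi_{[\frac{i-1}{k}, \frac{i}{k}]}$ is $\frac{1}{\sqrt{k}}$
and according to Corollary 2.12 we have

$W_n(\mathcal{H}) \ge \frac{1}{2} W_n(B_k) \ge \frac{1}{2\sqrt{k}} \sqrt{\frac{k-n}{k}}$.

Taking k = 2n provides the required result. This completes the proof
of Proposition 2.13 and of Theorem 2.10.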

2.4. Sparse representation of a step-function. Our main example
of the family $\mathcal{H}$ of the step-functions $H_t(x)$ allows us to
illustrate also some important features of "sparse representations".
Consider the Haar frame:

$HF = \{\phi_{k,j}(x) = 2^{k/2}\phi(2^k(x - j2^{-k})) \mid k \in \{0, 1, 2, \dots\},\ j \in \{0, 1, 2, \dots, 2^k - 1\}\}$

where $\phi = \chi_{[0,1]}$. To get an approximation of a certain fixed
step-function $H_{t_0}(x)$ consider the binary representation of $t_0$:

$t_0 = \sum_{r=1}^{\infty} \frac{a_r}{2^r}, \quad a_r = 0, 1$.

Then for each n the sum

(2.9)  $\sum_{r \le n} \frac{a_r}{2^{r/2}} \phi_{r, j_r}, \quad j_r = 2^r \sum_{s=1}^{r-1} \frac{a_s}{2^s}$,

leads to the approximation of $H_{t_0}(x)$ in the Haar frame with the
$L^2$-error at most $(\frac{1}{2})^{n/2}$. Indeed, the sum in (2.9) is, in
fact, a step-function $H_{t_1}$, with $t_1 \le t_0$ and
$t_0 - t_1 \le (\frac{1}{2})^n$ (see Figure 3).

So to ε-approximate each individual step-function $H_{t_0}$ via the Haar
frame in the $L^2$-norm, we need only $2\log(\frac{1}{\epsilon})$ nonzero
terms in the linear combination. This provides a natural example of a
"sparse representation".

Notice, however, that if we fix the required approximation accuracy
$\epsilon = 2^{-n/2}$ and then let the jump point t of $H_t$ change, then
the elements of the Haar frame participating in the representation of
different $H_t(x)$ eventually cover all the $2^n$ binary step-functions of
the n-th scale. So altogether, to approximate the entire curve
$\mathcal{H} \subset L^2([0,1])$, we need a space of dimension
$2^n = (\frac{1}{\epsilon})^2$. This agrees with the
value of $W_n(\mathcal{H})$ computed above.
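
The binary-expansion construction can be sketched in a few lines. This is an illustration with hypothetical function names, using exact rational arithmetic: the first n binary digits of $t_0$ select at most n indicator functions of dyadic intervals, whose sum is the step-function $H_{t_1}$ with $t_1 \le t_0$, and the $L^2$ error is $\sqrt{t_0 - t_1} \le 2^{-n/2}$.

```python
import math
from fractions import Fraction

def binary_digits(t0, n):
    """First n binary digits a_1..a_n of t0 in [0, 1)."""
    digits, x = [], Fraction(t0)
    for _ in range(n):
        x *= 2
        a = int(x >= 1)
        digits.append(a)
        x -= a
    return digits

def haar_frame_approx(t0, n):
    """Return (t1, terms, l2_error) for the n-digit approximation of H_t0.

    terms lists the frame elements used, as (level r, left endpoint) pairs:
    each one is the indicator of [left, left + 2^-r].
    """
    t1, terms = Fraction(0), []
    for r, a in enumerate(binary_digits(t0, n), start=1):
        if a:
            terms.append((r, t1))
            t1 += Fraction(1, 2 ** r)
    return t1, terms, math.sqrt(Fraction(t0) - t1)

# t0 = 0.10101 in binary: pieces of length 1/2, 1/8 and 1/32.
t1, terms, err = haar_frame_approx(Fraction(21, 32), 3)
print(t1, terms, err, 2 ** (-3 / 2))
```

Note how the selected levels depend on the digits of $t_0$: a small shift of the jump point can change all of them at once.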

Figure 3. Approximation by the Haar frame of $\chi_{[0,t_0]}$,
$t_0 = 0.10101\ldots_2$ (pieces of length $\frac{1}{2}$, $\frac{1}{8}$,
$\frac{1}{32}$)

2.5. n-term representation. In order to quantify the "sparseness" of
different representations (and, in particular, to include the previous
example in a more general framework) we call (following [T5J chapter
8]) a countable collection D of vectors in a Banach space a dictionary,
and define the error of the n-term approximation of a single function
f by:

(2.10)  $\sigma_n(f, D) = \inf_{w_i \in D,\ a_i} \| f - \sum_{i=1}^{n} a_i w_i \|$.

We use three different dictionaries for $L^2[0, 1]$: Fourier basis:

$FB = \{e^{2\pi i k x} \mid k \in \mathbb{Z}\}$,

Haar frame:

$HF = \{\phi_{k,j} = 2^{k/2}\phi(2^k(x - j2^{-k})) \mid k \in \{0, 1, 2, \dots\},\ j \in \{0, 1, 2, \dots, 2^k - 1\}\}$,

and Haar basis:

$HB = \{\psi_{k,j} = 2^{k/2}\psi(2^k(x - j2^{-k})) \mid k \in \{0, 1, 2, \dots\},\ j \in \{0, 1, 2, \dots, 2^k - 1\}\} \cup \{\phi\}$,

where $\phi = \chi_{[0,1]}$ and $\psi = \chi_{[0,1/2]} - \chi_{[1/2,1]}$.

Clearly,

$\sigma_n(\mathcal{H}_t, FB) \asymp \frac{1}{n^{1/2}}$,

which means that the best n-term approximation in this case is the
same as the usual linear Fourier approximation. Also, we have

$\sigma_n(\mathcal{H}_t, HF) \le C 2^{-n/2}$,
$\sigma_n(\mathcal{H}_t, HB) \le C 2^{-n/2}$.
Remark: It is customary in Approximation Theory to demand that the
n-term approximation be "computable", so that it has a polynomial-depth
search. This means that we can enumerate our dictionary with
a fixed enumeration $D = \{f_1, f_2, \dots\}$ in such a way that for a
certain polynomial $p : \mathbb{N} \to \mathbb{N}$ the n terms of the
approximation come from the first p(n) terms of the dictionary, see [9].
Clearly, if we consider HF and HB we will need to take each function from a
different level of the Haar basis/frame, and therefore our search will have
an exponential depth.

2.6. Temlyakov's non-linear width. The following notion of a "non-linear
width" was introduced in [55]:

Definition 2.14. Let A be a symmetric subset in a Banach space X.
Then the (N, m)-width $W_{(N,m)}(A)$ is defined as

$W_{(N,m)}(A) = \inf_{C_N \subset \mathcal{C}(X)_m,\ |C_N| = N}\ \sup_{f \in A}\ \inf_{L \in C_N} \operatorname{dist}(f, L)$,

where $\mathcal{C}(X)_m$ denotes the collection of all the linear
m-dimensional subspaces of X.

The approximation procedure suggested by this notion is as follows:
given N and m, we fix (in an optimal way) a subset $C_N$ of N different
m-dimensional linear subspaces $L_1, \dots, L_N$ in X. Then for each
specific function $f \in X$ we first pick the most suitable subspace $L_i$
in $C_N$, and then find the best linear approximation of f by the elements
of $L_i$.

The notion of a nonlinear width provides a "bridge" between the linear
approximation and the approximation based on "geometric models".
Indeed, ultimately the set $C_N$ may be just the set formed by all the
piecewise-constant functions (in our main example), for all the values
of the parameters, discretized with the required accuracy. See Section
3 below, where we analyze this "bridging" in somewhat more detail for
the curve $\mathcal{H}$.

The set $C_N$ suggests a covering of A by N sets

$V_i = \{g \in A \mid \operatorname{dist}(g, L_i) \le \operatorname{dist}(g, L_k) \text{ for all } L_k \in C_N\}$.

Namely, the set $V_i$ contains the elements of A that are best
approximated by the subspace $L_i$ from the collection $C_N$. In the next
lemma, we prove that we can replace $V_i$ by open sets.

Lemma 2.15. Let $O_N$ denote the set of all the open covers
$U = \{U_1, \dots, U_N\}$ of A of cardinality N. Then

$W_{(N,m)}(A) = \inf_{U \in O_N}\ \sup_{i} W_m(U_i)$.
In other words, we subdivide A into N open sets and check the m-width
on each of the sets separately. Then the maximum m-width over the N
sets is the (N, m)-width of A.

Proof: Denote by

$d = W_{(N,m)}(A)$

the left-hand side of the equation and by

$e = \inf_{U \in O_N}\ \sup_{i} W_m(U_i)$

its right-hand side. For ε > 0, we interpret the definition of d as the
existence of a collection $C_N = \{L_1, \dots, L_N\}$ of N m-dimensional
subspaces of X such that

$\inf_{L_i \in C_N,\ i = 1..N}\ \inf_{g \in L_i} \|f - g\| < d + \epsilon \quad \forall f \in A$.

Define $U_i = \{g \mid g \in A,\ \operatorname{dist}(g, L_i) < d + \epsilon\}$.
Clearly, $U_i$ are open in A since the distance is a continuous function.
According to the definition of $U_i$, $L_i$ approximates $U_i$ with
accuracy d + ε and therefore $W_m(U_i) \le d + \epsilon$.
We conclude that

(2.11)  $e \le \sup_i W_m(U_i) \le d + \epsilon$.

In the other direction, let $\bigcup_{i=1}^{N} U_i = A$ be such that
$W_m(U_i) \le e + \epsilon$. For each i we find an m-dimensional subspace
$L_i$ s.t.

$\operatorname{dist}(g, L_i) \le e + 2\epsilon \quad \forall g \in U_i$.

Then form $C_N = \{L_1, \dots, L_N\}$. Clearly,

$\sup_{f \in A}\ \inf_{L \in C_N} \operatorname{dist}(f, L) \le e + 2\epsilon$

and therefore

(2.12)  $d \le e + 2\epsilon$.

Taking ε → 0 in the inequalities (2.11), (2.12) we get the required
equality.



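In what follows, we take $X = L^2([0, 1])$.

Proposition 2.16.

$W_{(N,m)}(\mathcal{H}) \asymp \frac{1}{\sqrt{Nm}}$.

Proof:

To establish an upper bound, define

$L_k = \operatorname{span}\{\chi_{[\frac{k-1}{N} + \frac{n}{Nm},\ \frac{k-1}{N} + \frac{n+1}{Nm}]} : 0 \le n \le m - 1\}, \quad k = 1..N$.

Each $L_k$ approximates the part of $\mathcal{H}$ with
$t \in [\frac{k-1}{N}, \frac{k}{N}]$ within an error of order
$\frac{1}{\sqrt{Nm}}$. Therefore $W_{(N,m)}(\mathcal{H})$ is bounded above
by the error of $\{L_k\}_{k=1}^{N}$.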

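In order to establish the lower bound, we prove a variant of
Lemma 2.11.

Lemma 2.17. For a set
$A_k = \{e_i \mid \langle e_i, e_j \rangle = \lambda_i^2 \delta_{ij},\ 1 \le i \le k,\ \lambda_i > 0\}$
and $n < k$ the following inequality holds:

$W_n(A_k) \ge \sqrt{\frac{k-n}{k}}\ \min_i \lambda_i$.

The difference with Lemma 2.11 is that we allow orthogonal vectors
with varying lengths.

Proof: Let V be an n-dimensional space and $W = \operatorname{span}\{e_i\}$.
Just as in Lemma 2.11, we can assume $V \subset W$. Denoting the orthogonal
projection of W onto V by P, we are required to estimate
$\max_i \|(I - P)e_i\|$:

(2.13)  $\max_{1 \le i \le k} \|(I - P)e_i\| \ge \sqrt{\frac{1}{k}\sum_{i=1}^{k} \|(I - P)e_i\|^2} \ge \min_i \lambda_i \sqrt{\frac{1}{k}\sum_{i=1}^{k} \frac{1}{\lambda_i^2}\|(I - P)e_i\|^2}$.

Since $\frac{e_i}{\lambda_i}$ are orthonormal,

(2.14)  $\sum_{i=1}^{k} \frac{1}{\lambda_i^2}\|(I - P)e_i\|^2 = k - n$.

Combining equations (2.13) and (2.14) we get the required result.

We return to the proof of Proposition 2.16. We employ Lemma
2.15. Let $\{U_i\}_{i=1..N}$ be an open cover of $\mathcal{H}$. Define

$V_i = \mathcal{H}^{-1}(U_i)$.

Namely, $V_i \subset [0, 1]$ contains all the t's such that
$\mathcal{H}(t) \in U_i$. The sets $V_i$ are open in [0,1], since
$\mathcal{H}$ is continuous. The collection $\{V_i\}_{i=1..N}$ is a
covering of [0, 1] since $\{U_i\}$ is a cover of $\mathcal{H}$. Because
$V_i$ is open, we can find 2m + 1 points $\lambda_k$ in $V_i$ such that
$\lambda_{k+1} - \lambda_k \ge \frac{\operatorname{meas}(V_i)}{2m} - \epsilon$,
for any ε > 0, where meas($V_i$) denotes here the Lebesgue measure of
$V_i$. We apply Lemma 2.17 to
$B_{2m} = \{\chi_{[\lambda_k, \lambda_{k+1}]} : 1 \le k \le 2m\}$. Since
$\|\chi_{[\lambda_k, \lambda_{k+1}]}\| = \sqrt{\lambda_{k+1} - \lambda_k} \ge \sqrt{\frac{\operatorname{meas}(V_i)}{2m} - \epsilon}$,
the application of Lemma 2.17 gives

$W_m(B_{2m}) \ge \frac{1}{\sqrt{2}} \sqrt{\frac{\operatorname{meas}(V_i)}{2m} - \epsilon}$.

But $\chi_{[\lambda_k, \lambda_{k+1}]} = \mathcal{H}_{\lambda_{k+1}} - \mathcal{H}_{\lambda_k}$.
Denote $S = \{\mathcal{H}_{\lambda_k} : 1 \le k \le 2m + 1\} \subset U_i$.
For any vector space V, we have

$\operatorname{dist}(B_{2m}, V) \le 2\operatorname{dist}(S, V)$,

and therefore

(2.15)  $W_m(U_i) \ge W_m(S) \ge \frac{1}{2} W_m(B_{2m}) \ge \frac{1}{4}\sqrt{\frac{\operatorname{meas}(V_i)}{m} - 2\epsilon}$.

But since the sets $V_i$ cover [0,1], we have
$\sum_{i=1}^{N} \operatorname{meas}(V_i) \ge 1$ and so

$\max_i \operatorname{meas}(V_i) \ge \frac{1}{N}$.

Therefore

(2.16)  $\max_i W_m(U_i) \ge \frac{1}{4}\sqrt{\frac{1}{Nm} - 2\epsilon}$

for any open cover of $\mathcal{H}$. And so, according to Lemma 2.15,

(2.17)  $W_{(N,m)}(\mathcal{H}) \ge \frac{1}{4}\sqrt{\frac{1}{Nm} - 2\epsilon}$.

Thus we obtain the required lower bound after we take ε → 0.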



3. Linear versus Non-Linear Compression: some conclusions

In this section we summarize the above results, interpreting them
as estimates of the "compression" of the family $\mathcal{H}$ (and of other
families of piecewise-constant functions): how many bits do we need
to memorize an arbitrary jump-function $H_t$ in $\mathcal{H}$ with the
$L^2$-error at most ε, via different representation methods?

3.1. ε-entropy. Let us start with the ε-entropy: by Proposition 2.7,
$H(\epsilon, \mathcal{H}) \sim 2\log(\frac{1}{\epsilon})$. This is the
lower bound on the number of bits in any compression method.
3.2. "Model-based compression". Let us consider a "non-linear
model-based compression" which in the case of the jump-functions
takes an extremely simple form: we use the "library model" $H_t(x)$
to represent itself, and we memorize just the specific value of the
parameter t. Quite expectedly, "compression" with this model requires
exactly the number of bits prescribed by the ε-entropy. Indeed, since
the $L^2$-norm of $H_{t_2} - H_{t_1}$ is $\sqrt{t_2 - t_1}$, we have to
memorize t with the accuracy $\epsilon^2$. This requires exactly
$2\log(\frac{1}{\epsilon})$ bits.
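
A minimal sketch of this model-based encoding follows, under our own assumptions (uniform quantization of t on a dyadic grid; the function names are hypothetical). The jump point is stored with $\lceil 2\log_2(1/\epsilon) \rceil$ bits, and the $L^2$ error of the decoded step-function is $\sqrt{|t - \hat{t}|}$.

```python
import math

def encode_jump(t, eps):
    """Quantize t in [0,1] on a dyadic grid of step at most eps^2.

    Returns the cell index and the number of bits used to store it.
    """
    bits = math.ceil(2.0 * math.log2(1.0 / eps))
    levels = 2 ** bits
    index = min(levels - 1, int(t * levels))
    return index, bits

def decode_jump(index, bits):
    """Midpoint of the quantization cell, so |t - t_hat| <= 2^-(bits+1)."""
    return (index + 0.5) / 2 ** bits

def l2_error(t, t_hat):
    """||H_t - H_t_hat|| in L^2([0,1])."""
    return math.sqrt(abs(t - t_hat))

eps = 0.05
index, bits = encode_jump(0.3141, eps)
t_hat = decode_jump(index, bits)
print(bits, t_hat, l2_error(0.3141, t_hat))
```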

3.3. "Linear" compression. Let us assume now that, given the required
accuracy ε, we insist on a representation of the functions $H_t(x)$
in a fixed basis, the same for each t. On the other hand, we allow
the approximating linear space to depend on ε. This leads to the
Kolmogorov n-width, as defined in Section 2.3. We store each coefficient
with the maximal error ε, so we allow $\log(\frac{1}{\epsilon})$ bits for
it (and thus we ignore the very special "sparse" nature of the
representation of $H_t(x)$ in some special bases, for instance, in the
Haar frame, discussed in Section 2.4). Then the number of bits required is
given by the "linear ε-entropy" $H_l(\epsilon, \mathcal{H})$, introduced in
Section 2.3. By Theorem 2.10, we have

$H_l(\epsilon, \mathcal{H}) \asymp (\frac{1}{\epsilon})^2 \log(\frac{1}{\epsilon})$.

In fact, to get a representation with this amount of information stored,
we do not need all the freedom provided by the definition of n-width.
It is enough to fix the approximating space to be the space of
trigonometric polynomials for any required accuracy ε. Then to approximate
$H_t$ with the $L^2$-accuracy ε we take the Fourier polynomial of $H_t$ of
degree $n \asymp (\frac{1}{\epsilon})^2$ and memorize its coefficients with
the accuracy indicated above.
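
To get a feeling for the orders of magnitude, the following rough sketch (an illustration with hypothetical function names, not a precise coding scheme) counts the bits of the "naive" Fourier representation. It assumes the elementary tail bound $\sqrt{2}/(\pi\sqrt{n})$ for the truncation error, which follows from $|a_k| \le 1/(\pi|k|)$, and $\log_2(1/\epsilon)$ bits per stored coefficient.

```python
import math

def linear_fourier_bits(eps):
    """Degree n and rough bit count for the Fourier representation of H_t.

    n is chosen so that sqrt(2) / (pi sqrt(n)) <= eps; the 2n + 1
    coefficients are stored with log2(1/eps) bits each.
    """
    n = math.ceil(2.0 / (math.pi * eps) ** 2)
    coeffs = 2 * n + 1
    return n, coeffs * math.log2(1.0 / eps)

def model_based_bits(eps):
    """Bits needed to store the jump point t with accuracy eps^2."""
    return 2.0 * math.log2(1.0 / eps)

for eps in (0.1, 0.01, 0.001):
    n, bits = linear_fourier_bits(eps)
    print(eps, n, round(bits), round(model_based_bits(eps), 2))
```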

3.4. "Non-linear width" compression. In [56] a notion of a "non-linear
(N, m)-width" has been introduced (see Section 2.6 above). It
suggests the following procedure for approximating functions $H_t(x)$:
given the required accuracy ε, we fix a subset of N m-dimensional
linear subspaces $L_1, \dots, L_N$ in $L^2[0, 1]$. Then for each specific
function $H_t(x)$ we first pick one of the subspaces $L_i$ (the most
suitable), then find the best linear approximation of $H_t(x)$ by the
elements of $L_i$, and finally memorize the coefficients of the best linear
approximation found.

Let us estimate the number of bits required in this approach. By
Proposition 2.16, for the non-linear (N, m)-width of $\mathcal{H}$ we have

$W_{(N,m)}(\mathcal{H}) \asymp \frac{1}{\sqrt{Nm}}$.

Given the required accuracy ε, we have to fix the parameters N and m
in such a way that $\frac{1}{\sqrt{Nm}} \le \epsilon$. Therefore, for each
choice of m between 1 and $(\frac{1}{\epsilon})^2$ we have to take
$N = \frac{1}{m}(\frac{1}{\epsilon})^2$. To memorize the choice of the
space $L_i$ we need then
$\log N = 2\log(\frac{1}{\epsilon}) - \log m$ bits. To memorize the
coefficients we need $m\log(\frac{1}{\epsilon})$ bits. Hence, the total
amount of bits is

$(m + 2)\log(\frac{1}{\epsilon}) - \log m$.

Certainly, the best choice is m = 1: we just take
$N = (\frac{1}{\epsilon})^2$ elements $H_{t_i}$, $t_i = \frac{i}{N}$, and
approximate $H_t$ with the nearest among them. This
is, essentially, the same as the "model-based" representation in Section
3.2 above.
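
The trade-off between N and m can be tabulated directly from the bit count $(m + 2)\log(1/\epsilon) - \log m$. The sketch below is only an illustration with hypothetical function names; it uses binary logarithms and ignores the constants hidden in the width estimate.

```python
import math

def subspaces_needed(m, eps):
    """Number N of m-dimensional subspaces with 1/sqrt(N m) <= eps."""
    return math.ceil((1.0 / eps) ** 2 / m)

def nonlinear_width_bits(m, eps):
    """Total bits: log2 N for the subspace choice plus m log2(1/eps) for coefficients."""
    return (m + 2) * math.log2(1.0 / eps) - math.log2(m)

eps = 0.01
for m in (1, 2, 10, 100):
    print(m, subspaces_needed(m, eps), round(nonlinear_width_bits(m, eps), 2))

best_m = min(range(1, 101), key=lambda m: nonlinear_width_bits(m, eps))
print(best_m)
```

For small ε the count is increasing in m, so the smallest admissible dimension is the cheapest choice.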

3.5. "Sparse" representation. Till now the comparison was in favor
of a model-based approach. Let us consider now the Haar frame
representation of $H_t(x)$ considered in Section 2.4 above. This is the
most natural competitor, both because of its theoretical efficiency, and
since many modern practical approximation schemes are based on sparseness
considerations (see [51 1531 157) ).

By the computation of Section 2.4, to approximate each individual
step-function $H_{t_0}$ via the Haar frame in the $L^2$-norm, we need only
$m = 2\log(\frac{1}{\epsilon})$ nonzero terms in the linear combination.
Moreover, each coefficient in this linear combination is 1. So to memorize
$H_{t_0}$ via the Haar frame it is enough to specify the position of the
$m = 2\log(\frac{1}{\epsilon})$ nonzero elements among the total Haar frame
of cardinality $2^m = (\frac{1}{\epsilon})^2$. We need

$\log \frac{(2^m)!}{m!\,(2^m - m)!} \asymp m^2 \asymp [\log(\frac{1}{\epsilon})]^2$

bits to do this.
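
The order $m^2$ of this count is easy to confirm numerically. The following sketch (an illustration with a hypothetical function name) evaluates $\log_2 \binom{2^m}{m}$ exactly and compares it with the elementary bounds $m(m - \log_2 m) \le \log_2 \binom{2^m}{m} \le m^2$, which follow from $(N/m)^m \le \binom{N}{m} \le N^m$.

```python
import math

def haar_selection_bits(m):
    """Bits needed to specify which m of the 2^m frame elements are used."""
    return math.log2(math.comb(2 ** m, m))

for m in (2, 4, 8, 12):
    lower = m * (m - math.log2(m))
    print(m, round(lower, 2), round(haar_selection_bits(m), 2), m * m)
```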

We get a little bit more information to store than in the "model-based"
approach. Also, it may not look natural to approximate such a simple pattern
as a jump of a step-function with a geometric sum of shrinking signals.
However, the main problem is that if we let the jump point $t$ of $H_t$
change, then the elements of the Haar frame participating in the
representation of different $H_t(x)$ jump themselves in a very sporadic way,
and eventually cover all the $2^m$ binary Haar frame functions of the $m$-th
scale.

Notice also that from the point of view of the non-linear width
(Section 2.6 above) the considered Haar frame representation takes an
intermediate position: here $m = 2\log(\frac{1}{\epsilon})$. But any subspace
$L = \mathrm{span}\{\chi_{[t_{k_i},\, t_{k_i} + 2^{-k_i}]} : i = 1, \dots, m\}$
can cover only $H_{t_{k_i}}$ and their $\epsilon$-neighborhoods, and therefore
$L$ covers with the accuracy $\epsilon$ only a set of measure
$4\epsilon^2(\log(\frac{1}{\epsilon}) + 1)$ out of the entire interval
$[0, 1]$ of parameters. Thus to cover the entire interval we will need
$N \asymp \frac{1}{\epsilon^2\log(1/\epsilon)}$ subspaces. We conclude that
$\epsilon \asymp \frac{1}{\sqrt{Nm}}$, in agreement with Proposition 2.16. The
required number of bits is

log (-X) +2 log (-) - log log (-) + log 2. 



e / / \ £ I \ e 



So it would be much more natural and efficient to represent a
"video-sequence" $\mathcal{H} = \{H_t(x),\ t \in [0, 1]\}$ by a moving model
than to follow the jumping parameters in a sparse Haar representation for
variable $t$. This conclusion certainly is not original. The problem is to
get a full quality model-based geometric representation of real life images
and video-sequences!



4. Non-linear Fourier inversion 

Now we turn to our second main problem: how to reconstruct functions with
singularities (piecewise-constant functions) from a small number of
measurements? Let us assume that our "measurements" are just the scalar
products of the function $f$ to be reconstructed with a certain sequence of
basis functions. In particular, below we assume our measurements to be the
Fourier coefficients of $f$ or its moments. This is a realistic assumption in
many practical problems, like Computer Tomography.

The rate of Fourier approximation of a given function and the ac- 
curacy of its reconstruction from partial Fourier data is determined by 
regularity of this function. For functions with singularities, even very 
simple, like the Heaviside function, the convergence of the Fourier series
is very slow. Hence a straightforward reconstruction of the original
function from its partial Fourier data (i.e. forming partial sums of the
Fourier series) in this case is difficult. It also involves some systematic
errors (like the so-called Gibbs effect).
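
The slow convergence can be made quantitative for the step function
$\chi_{[0,t]}$. The sketch below (Python) uses Parseval's identity: the
squared $L^2$ error of the $n$-th partial sum is the sum of $|c_k|^2$ over
$|k| > n$, with $|c_k|^2 = \sin^2(\pi k t)/(\pi k)^2$. The function name and
the truncation level `k_max` of the infinite tail are assumptions of this
illustration.

```python
import math

def partial_sum_l2_error(t, n, k_max=200_000):
    """L2([0,1]) error of the n-th Fourier partial sum of chi_[0,t].

    The squared error is the sum of |c_k|^2 over |k| > n; the infinite
    tail is truncated at k_max.
    """
    tail = sum(math.sin(math.pi * k * t) ** 2 / (math.pi * k) ** 2
               for k in range(n + 1, k_max + 1))
    return math.sqrt(2.0 * tail)   # factor 2: the terms for k and -k coincide

errors = {n: partial_sum_l2_error(0.5, n) for n in (100, 400, 1600)}
for n, err in errors.items():
    print(n, err, err * math.sqrt(n))
```

Multiplying the number of coefficients by 4 only halves the error, i.e. the
error decays like $1/\sqrt{n}$.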

Let us show that no linear reconstruction method can do significantly
better than the straightforward Fourier expansion.

Theorem 4.1. Let the function acquisition process comprise taking $n$
measurements (linear or non-linear) $m_i(f)$, $i = 1, \dots, n$, of the
function $f$, together with a subsequent processing $P$ of these
measurements. If the processing operator $\tilde f = P(m_1, \dots, m_n)$ is a
linear operator from $\mathbb{R}^n$ to $L^2([0, 1])$ then for some
$f \in \mathcal{H}$ the error $\|f - \tilde f\|$ is at least
$C_1\frac{1}{\sqrt{n}}$.

Proof: This follows directly from Theorem 2.10 above. Indeed, the
$n$-dimensional linear subspace $\mathrm{Im}(P)$ cannot approximate all the
functions in $\mathcal{H}$ with the error better than the Kolmogorov
$n$-width of $\mathcal{H}$.

If we have no a priori information on $f \in L^2([0, 1])$ then probably
the straightforward Fourier reconstruction as above remains the best
solution. However, in our case we know that $f$ is a piecewise constant
function. It is completely defined by the positions of its jumps
$0 < x_i < 1$, $i = 1, \dots, N$, and by its values $A_i$ between the jumps.
So let us consider $x_i$ and $A_i$ as unknowns and let us substitute these
unknowns into the integral expression for the Fourier coefficients. We get
certain analytic expressions in $x_i$ and $A_i$. Equating these expressions
to the measured values of the corresponding Fourier coefficients we get a
system of nonlinear equations on the unknowns $x_i$ and $A_i$. Let us write
down this system explicitly.

4.1. Fourier inversion system. Let
$f(x) = \sum_{k=-\infty}^{\infty} c_k e^{2\pi i k x}$ be the Fourier expansion
of $f$. Here $c_k = \int_0^1 f(t) e^{-2\pi i k t}\,dt$. Taking into account
the special form of $f$ as given above we obtain, for $k \neq 0$,
$c_k = \frac{1}{-2\pi i k}[-A_0 + \sum_{i=1}^{N}(A_{i-1} - A_i)e^{-2\pi i k x_i} + A_N e^{-2\pi i k}]$.
Here $A_0$ is the value of $f$ on the leftmost continuity interval. Denoting
$2\pi i k c_k$ by $\tilde c_k$ and $e^{-2\pi i x_i}$ by $z_i$, we finally get
the following infinite system

(4.1) $A_0 + \sum_{i=1}^{N}(A_i - A_{i-1}) z_i^k - A_N e^{-2\pi i k} = \tilde c_k$, $k \in \mathbb{Z}$.

The unknowns in system (4.1) are $A_j$, $j = 0, \dots, N$, which enter this
system in a linear way, and $z_i$, $i = 1, \dots, N$, entering it
non-linearly.
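
The closed form for the Fourier coefficients of a piecewise-constant
function can be cross-checked numerically. The sketch below (Python) compares
it with a midpoint-rule quadrature, using the normalization in which
$2\pi i k c_k$ equals the left-hand side of (4.1). The function names, the
sample jumps and values, and the grid size are assumptions made for
illustration.

```python
import cmath
import math

def coeff_closed_form(k, jumps, values):
    """c_k (k != 0) for f equal to values[i] between the i-th and (i+1)-th jump.

    Uses 2*pi*i*k*c_k = A_0 + sum_i (A_i - A_{i-1}) z_i^k - A_N e^{-2 pi i k},
    with z_i = exp(-2 pi i x_i).
    """
    total = values[0] - values[-1] * cmath.exp(-2j * math.pi * k)
    for i, x in enumerate(jumps, start=1):
        z = cmath.exp(-2j * math.pi * x)
        total += (values[i] - values[i - 1]) * z ** k
    return total / (2j * math.pi * k)

def coeff_midpoint(k, jumps, values, cells=2000):
    """c_k by the midpoint rule on a uniform grid of [0, 1]."""
    h = 1.0 / cells
    acc = 0j
    for j in range(cells):
        t = (j + 0.5) * h
        idx = sum(1 for x in jumps if x < t)   # continuity interval containing t
        acc += values[idx] * cmath.exp(-2j * math.pi * k * t) * h
    return acc

jumps = [0.2, 0.5, 0.75]           # x_1 < x_2 < x_3
values = [1.0, -2.0, 0.5, 3.0]     # A_0, ..., A_3
max_gap = max(abs(coeff_closed_form(k, jumps, values) - coeff_midpoint(k, jumps, values))
              for k in range(1, 6))
print(max_gap)
```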

System (4.1) classically appears in Padé Approximation. Very similar
systems appear in the reconstruction of plane polygonal domains from
their moments ( SH SSI 123 US US] ) • A detailed investigation of a 
larger class of systems similar to (4.1) is given in [31], [32]. In particular,
we have the following result: 

Theorem 4.2. Assume that the right-hand sides $\tilde c_k$ of (4.1) are
obtained from the Fourier coefficients of a piecewise-constant function $f$
with $A_i \neq A_{i+1}$, $i = 0, \dots, N$. Then each subsystem of (4.1)
obtained by taking from it $2N + 1$ consecutive equations has a unique
solution $\{A_j,\ j = 0, \dots, N\}$, $\{z_i,\ i = 1, \dots, N\}$, with
$x_i = -\frac{1}{2\pi i}\log z_i$ being the jump points of $f$ and $A_j$
being the values of $f$ on its continuity intervals.

We give a sketch of the proof, following [31], [32], in Section 4.2 below.
Thus solving an appropriate subsystem of system (4.1) we find the jumps and
the intermediate values of $f$, so we reconstruct $f$ exactly. If $f$ has
$N$ jumps we need only $2N + 1$ Fourier coefficients to reconstruct it.
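
The simplest instance already shows the contrast with the linear approach.
For the Heaviside function $\chi_{[0,t]}$, assuming that the two values 1 and
0 are known in advance, the first Fourier coefficient
$c_1 = (1 - e^{-2\pi i t})/(2\pi i)$ determines the jump point exactly, since
$e^{-2\pi i t} = 1 - 2\pi i c_1$. A minimal sketch in Python (the function
names are hypothetical):

```python
import cmath
import math

def heaviside_coeff(k, t):
    """k-th Fourier coefficient (k != 0) of chi_[0,t] on [0, 1]."""
    return (1 - cmath.exp(-2j * math.pi * k * t)) / (2j * math.pi * k)

def recover_jump(c1):
    """Recover t in [0, 1) from c_1 alone, via z = exp(-2 pi i t) = 1 - 2 pi i c_1."""
    z = 1 - 2j * math.pi * c1
    return (-cmath.phase(z) / (2 * math.pi)) % 1.0

t_true = 0.3
t_found = recover_jump(heaviside_coeff(1, t_true))
print(t_true, t_found)
```

One measurement gives the jump up to rounding errors, while the partial
Fourier sums need about $1/\epsilon^2$ coefficients to reach accuracy
$\epsilon$ in $L^2$.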



4.2. Other examples of the inversion systems. Let us start with
another system which essentially coincides with (4.1). To simplify the
presentation we shall consider instead of the Fourier coefficients of the
function $g(x)$, $x \in [0, 1]$, the moments
$m_k(g) = \int_0^1 x^k g(x)\,dx$.

4.2.1. Linear combination of $\delta$-functions. Let
$g(x) = \sum_{i=1}^{n} A_i\delta(x - x_i)$. For this function we have

(4.2) $m_k(g) = \int_0^1 x^k \sum_{i=1}^{n} A_i\delta(x - x_i)\,dx = \sum_{i=1}^{n} A_i x_i^k$.

So assuming that we know the moments $m_k(g) = \alpha_k$,
$k = 0, \dots, 2n - 1$, we obtain the following non-linear system of
equations for the parameters $A_i$ and $x_i$, $i = 1, \dots, n$, of the
function $g$:

(4.3)
$\sum_{i=1}^{n} A_i = \alpha_0$,
$\sum_{i=1}^{n} A_i x_i = \alpha_1$,
$\dots$
$\sum_{i=1}^{n} A_i x_i^{2n-1} = \alpha_{2n-1}$.

This system can be solved as follows: consider the moments generating
function

$I(z) = \sum_{k=0}^{\infty} m_k(g) z^k$.

The representation (4.2) of the moments immediately implies that

(4.4) $I(z) = \sum_{i=1}^{n} \frac{A_i}{1 - z x_i}$.

So it remains to find explicitly the rational function $I(z)$ from its first
$2n$ Taylor coefficients $\alpha_0, \dots, \alpha_{2n-1}$.

To do this we recall that the Taylor coefficients of a rational function
satisfy a linear recurrence relation of the form

(4.5) $m_{r+n} = \sum_{j=0}^{n-1} C_j m_{r+j}$, $r = 0, 1, \dots$.

Since we know the first $2n$ Taylor coefficients
$\alpha_0, \dots, \alpha_{2n-1}$, we can write a linear system on the unknown
recursion coefficients $C_j$:

(4.6)
$\sum_{j=0}^{n-1} C_j \alpha_j = \alpha_n$,
$\sum_{j=0}^{n-1} C_j \alpha_{j+1} = \alpha_{n+1}$,
$\dots$
$\sum_{j=0}^{n-1} C_j \alpha_{j+n-1} = \alpha_{2n-1}$.

Solving linear system (4.6) with respect to the recurrence coefficients
$C_j$ we find them explicitly. For a solvability of (4.6) see [46], [31],
[32], [51], [52].
See also 031 S3 [261 [B]. 

Now the recurrence relation (4.5) with known coefficients $C_j$ and
known initial moments allows us to easily reconstruct the generating
function $I(z)$ and hence to solve (4.3).
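
The procedure can be followed by hand in the smallest case $n = 2$. The
sketch below (Python) solves the linear system (4.6) for $C_0$, $C_1$ by
Cramer's rule, finds the nodes as the roots of $x^2 - C_1 x - C_0 = 0$ (the
characteristic equation of the recurrence (4.5)), and then recovers the
amplitudes from the first two equations of (4.3). The function name and the
test data are assumptions; real, distinct nodes are assumed.

```python
import math

def solve_two_deltas(alpha):
    """Recover (A_1, x_1), (A_2, x_2) from alpha[k] = A_1 x_1^k + A_2 x_2^k, k = 0..3."""
    a0, a1, a2, a3 = alpha
    # Linear system (4.6) for n = 2:  C0*a0 + C1*a1 = a2,  C0*a1 + C1*a2 = a3.
    det = a0 * a2 - a1 * a1
    c0 = (a2 * a2 - a1 * a3) / det
    c1 = (a0 * a3 - a1 * a2) / det
    # Each node satisfies x^2 = C0 + C1*x, so the nodes are the roots of x^2 - C1*x - C0.
    disc = math.sqrt(c1 * c1 + 4.0 * c0)
    x1, x2 = (c1 - disc) / 2.0, (c1 + disc) / 2.0
    # The amplitudes solve A_1 + A_2 = a0, A_1 x_1 + A_2 x_2 = a1.
    amp1 = (a1 - a0 * x2) / (x1 - x2)
    amp2 = a0 - amp1
    return (amp1, x1), (amp2, x2)

# Hypothetical data: g = 2*delta(x - 0.25) - delta(x - 0.75).
moments = [2.0 * 0.25 ** k - 0.75 ** k for k in range(4)]
(amp1, x1), (amp2, x2) = solve_two_deltas(moments)
print(amp1, x1, amp2, x2)
```

Only the solution of a 2x2 linear system and of one quadratic equation is
needed; for general $n$ the same steps require an $n \times n$ linear solve
and the roots of a polynomial of degree $n$.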

4.2.2. Algebraic functions. Let now $g(x)$ be an algebraic function on
$[0, 1]$. By definition, $y = g(x)$ satisfies an equation

(4.7) $a_n(x)y^n + a_{n-1}(x)y^{n-1} + \dots + a_1(x)y + a_0(x) = 0$,

where $a_n(x), \dots, a_0(x)$ are polynomials in $x$ of degree $m$.
$d = m + n$ is, by definition, the degree of $g$.

A general method for the non-linear inversion of the moment (Fourier)
transforms of algebraic functions is given in [32]. Its "quantitative" form
is given in [52]. Here we analyze only one special case. Assume that the
algebraic curve $y = g(x)$ is a rational one. This means that it allows
for a rational parametrization

(4.8) $x = P(t)$, $y = Q(t)$.

The moments $m_k(g)$ given by $m_k(g) = \int_0^1 x^k g(x)\,dx$,
$k = 0, 1, \dots$, now can be expressed as

(4.9) $m_k(g) = \int_0^1 P^k(t) Q(t) p(t)\,dt$,

where $p$ denotes the derivative $P'$ of $P$. Moments of this form naturally
appear in relation with some classical problems in Qualitative Theory

of ODE's -see [51 E|, BEl EE]. 

Our problem can be reformulated now as the problem of explicitly
finding $P$ and $Q$ from knowing a certain number of the moments $m_k$
in (4.9). Of course, in general we cannot expect this system of non-linear
equations to have a unique solution. Indeed, while the function
$y = g(x)$ is determined by its moments in a unique way, the rational
parametrization of this curve in general is not unique. In particular, let
$W(t)$ be a rational function satisfying $W(0) = 0$, $W(1) = 1$.
Substituting $W(t)$ into $P$ and $Q$ we get another rational parametrization
of our curve: $x = \tilde P(t)$, $y = \tilde Q(t)$ with
$\tilde P(t) = P(W(t))$, $\tilde Q(t) = Q(W(t))$.
Consequently, the "inversion problem" for system (4.9) is:

To characterize all the solutions of system (4.9) and to provide an
effective way to find these solutions.
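
The non-uniqueness is easy to observe numerically. The sketch below (Python)
evaluates the moments (4.9) by Simpson's rule for a hypothetical example
curve $x = t^2$, $y = t$ (so that $y = \sqrt{x}$ and $m_k = 2/(2k+3)$), and
for its reparametrization by $W(t) = t^2$. A polynomial $W$ is used as the
simplest rational function with $W(0) = 0$, $W(1) = 1$; all names and the
quadrature are assumptions of this illustration.

```python
def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * func(a + j * h)
    return total * h / 3.0

def moment(k, P, dP, Q):
    """m_k = integral over [0, 1] of P(t)^k Q(t) P'(t) dt, as in (4.9)."""
    return simpson(lambda t: P(t) ** k * Q(t) * dP(t), 0.0, 1.0)

# Hypothetical example curve: x = t^2, y = t.
def P(t): return t * t
def dP(t): return 2.0 * t
def Q(t): return t

# Reparametrization by W(t) = t^2, which satisfies W(0) = 0 and W(1) = 1.
def W(t): return t * t
def dW(t): return 2.0 * t
def P2(t): return P(W(t))
def dP2(t): return dP(W(t)) * dW(t)     # chain rule
def Q2(t): return Q(W(t))

original = [moment(k, P, dP, Q) for k in range(5)]
reparam = [moment(k, P2, dP2, Q2) for k in range(5)]
print(original)
print(reparam)
```

Both parametrizations produce the same moments, so the moments alone cannot
distinguish between them.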

A special case of the inversion problem is the "Moment vanishing
problem":

To characterize all the pairs $P$, $Q$ for which the moments defined
by (4.9) vanish.

In spite of a very classical setting (we ask for conditions of
orthogonality of $Q$ to all the powers of $P$!) this problem has been solved
only very recently ([48]). It plays a central role in the study of the
center conditions for the Abel differential equation (see [5], [6], [H [TTJ [62]).



4.2.3. Functions of two variables. The approach to reconstruction of 
piecewise-smooth (piecewise-polynomial) functions of one variable dis- 
cussed above can be extended to two (and more) variables. The case 
of characteristic functions of polygonal plane domains is considered in 
[141 HS1 EH EE] ■ Some initial instances of the reconstruction problem of 
piecewise-polynomial functions of two variables are considered in [51].
Even the most initial examples in two dimensions provide an exciting
variety of non-linear systems bringing us to the very heart of Analysis.
Let us mention here only one example and a few of the most directly 
related references. 

We want to reconstruct a function $f(x, y)$ of two variables from its
moments

(4.10) $m_{kl}(f) = \iint x^k y^l f(x, y)\,dx\,dy$.

Assume that $f$ is a $\delta$-function along a rational curve $S$, i.e. for
any $\psi(x, y)$ we have
$\iint f\psi\,dx\,dy = \int_S \psi(x, y)\,dx$. Let

(4.11) $x = P(t)$, $y = Q(t)$, $t \in [0, 1]$

be a rational parametrization of $S$. The moments now can be expressed as

(4.12) $m_{kl}(f) = \int_0^1 P^k(t) Q^l(t) p(t)\,dt$,

where $p$ denotes the derivative $P'$ of $P$. The study of the double
moments of this form brings us naturally to the recent work of G. Henkin
[15], [29], [30]. Indeed, the vanishing condition for the moments (4.12) is
given by Wermer's theorem (PQ): $m_{kl}(f) = 0$ if and only if $S$ bounds a
complex 2-chain in $\mathbb{C}^2$. In general, if the moments $m_{kl}(f)$ do
not vanish identically, then the local germ of complex analytic curve
$\hat S$ generated by $S$ in $\mathbb{C}^2$ does not "close up" inside
$\mathbb{C}^2$. G. Henkin's work ([15], [29], [30]), in particular, analyzes
various possibilities of this sort in terms of the "moments generating
function". We expect that a proper interpretation of the results of [15],
[29], [30] can help also in understanding of the moment inversion problem.

There are many other results closely related to our problem (see
references in [44], [45], [26], [TB], [51], [52]). Here we mention in
addition only [27], [50] where, in particular, the problem of a
reconstruction of plane "quadrature domains" from their double moments is
considered, and results on moments on Semi-Algebraic sets and positivity
(see [36], [37], [53] and references there).

4.3. Robustness of solutions of (4.1). Let us return to functions of
one variable. The assumption of $f$ being a piecewise-constant function
may look too unrealistic in applications. However, the methods can be
extended to piecewise-polynomial and ultimately to piecewise-smooth 
functions (see [31] [32] [51] [52]). The last class is of major importance 
in applied Analysis and Signal Processing, and the problem of a recon- 
struction of such functions from their measurements (Fourier data) is 
at present actively investigated (see [22]- [25], [49] HUGS] and references 
there). 

The key issue in the extension of the above methods to piecewise- 
smooth functions is a robustness of solutions of (4.1), (4.3), (4.9) and 
similar systems. In particular, what happens if we take more than
exactly $2N + 1$ consecutive equations in (4.1), and because of the noise
in our measurements the right-hand side is not exactly a sequence of the
Fourier coefficients of a piecewise-constant function? Some important
results in this direction can be found in [44] 145] I2"6"l [T5] . 

We further investigate these problems in [52]. Our initial considerations
show that one can define a robust procedure for solving systems like (4.1),
(4.3) for any right-hand side, taking more equations than $2N + 1$ and
replacing the exact solution by the least-squares fitting. Notice, however,
that we apply this procedure not to the original non-linear system (say,
(4.3)) but to a linear system (4.6) for the parameters $C_j$ of the linear
recurrence relation, satisfied by the Fourier coefficients (moments) $c_k$
($m_k$) of any piecewise-constant function.
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We expect also that taking more than the minimal number of the 
measurements, and hence of the equations in (4.1) (say, twice the
minimal number) can strongly improve the robustness of the solution. This
conclusion is supported by recent results in [6U[63] where we investigate
similar problems for Hermite interpolation and Hermite least-square 
fitting. 
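
The least-squares variant can be sketched for two nodes. The code below
(Python) fits the recurrence $\alpha_{r+2} = C_0\alpha_r + C_1\alpha_{r+1}$
to eight slightly perturbed moments through the 2x2 normal equations and
then reads off the nodes. The function names, the deterministic alternating
perturbation used as a stand-in for noise, and the sample data are
assumptions; this is an illustration, not the procedure of [52].

```python
import math

def fit_recurrence(alpha):
    """Least-squares fit of alpha[r+2] = C0*alpha[r] + C1*alpha[r+1] over all available r."""
    s00 = s01 = s11 = b0 = b1 = 0.0
    for r in range(len(alpha) - 2):
        u, v, w = alpha[r], alpha[r + 1], alpha[r + 2]
        s00 += u * u
        s01 += u * v
        s11 += v * v
        b0 += u * w
        b1 += v * w
    # Normal equations: [[s00, s01], [s01, s11]] (C0, C1)^T = (b0, b1)^T.
    det = s00 * s11 - s01 * s01
    return (b0 * s11 - b1 * s01) / det, (s00 * b1 - s01 * b0) / det

def nodes_from_recurrence(c0, c1):
    """Roots of x^2 - C1*x - C0 = 0, in increasing order (real roots assumed)."""
    disc = math.sqrt(c1 * c1 + 4.0 * c0)
    return (c1 - disc) / 2.0, (c1 + disc) / 2.0

exact = [2.0 * 0.25 ** k - 0.75 ** k for k in range(8)]          # nodes 0.25 and 0.75
noisy = [a + 1e-6 * (-1) ** k for k, a in enumerate(exact)]      # deterministic stand-in for noise
x1, x2 = nodes_from_recurrence(*fit_recurrence(noisy))
e1, e2 = nodes_from_recurrence(*fit_recurrence(exact))
print(x1, x2)
```

With exact data the overdetermined system is consistent and the nodes are
recovered up to rounding; with perturbed data the fit remains close to them
in this well-conditioned example.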

4.4. Piecewise-smooth functions. We expect that applying the above 
results to the case of piecewise-smooth functions we can get, in partic- 
ular, the following result: 

Conjecture. Let $f$ be a piecewise $C^k$ function on $[0, 1]$ with $N$
discontinuity points $x_i$. Assume that the $C^k$-norm of $f$ on each
continuity interval does not exceed $M$. Assume also that the distance
$|x_i - x_j| > D$ for $i \neq j$, and that the jump of $f$ at each of its
discontinuity points $x_i$ is at least $J$. Then for each $n > 2N + 1$ the
points $x_i$ and the values
of f between the points Xi can be reconstructed from the first n Fourier 
coefficients of f with the accuracy where the constant C depends 
only on M, N, J and D. 

We expect also that $x_i$ and the approximating polynomials of $f$ on
its continuity intervals are provided by universal analytic expressions
in $c_k$ (see [31], [32]). In [52] we prove a weaker version of this result
(with a weaker estimate of the approximation accuracy). The main steps in
the proof are as follows:

1. We fix an approximation accuracy $\epsilon > 0$. We approximate $f$ up
to $\epsilon$ by a piecewise-polynomial AP of degree $d$ on each of its
continuity intervals. By classical Approximation Theory this can be achieved
with $d = C_1(\frac{1}{\epsilon})^{\frac{1}{k}}$.

2. We consider the jump points $x_i$ and all the coefficients of the
piecewise polynomials constituting AP as the unknowns, and substitute them
into a system (*) similar to (4.1), which is constructed "once and for all"
for piecewise-polynomials of degree $d$ (see [31], [32], [51], [52]). As
the right-hand side we take the Fourier coefficients $c_k$ of $f$. By the
choice of AP we know that its Fourier coefficients $\hat c_k$ satisfy
$|\hat c_k - c_k| \leq \epsilon$ for any $k$.

3. At this step we determine the number of the equations (i.e. of the
Fourier coefficients of $f$) we need to achieve the prescribed accuracy
$\epsilon$. We pick an appropriate finite subsystem (**) of (*). Then we
solve (**) with respect to the unknown parameters of AP. By the robustness
estimates of [52] the solution differs from the true parameters of AP
by at most $C_2\epsilon$.

4. We form a piecewise-polynomial AP with the parameters found in step 3.
By the above estimates, the jump points of AP approximate the true jump
points of $f$ with the accuracy $C_3\epsilon$ while the partial polynomials
of AP approximate the values of $f$ on its continuity intervals with the
accuracy $C_4\epsilon$. This completes the proof.

In [52] we provide a detailed proof. We also compare the above 
results with the classical results of Approximation Theory on one side, 
and with some recent results on linear (or semi-linear) reconstruction 
methods for piecewise-smooth functions (see [22]-[25], [HI HH El], and
references there).

5. Digital images 

Our considerations in Sections 2-4 above were motivated, in particular,
by an attempt to estimate the expected efficiency of linear versus
non-linear methods of acquisition and compression of still images and
video-sequences.

Application of rigorous mathematical tools in Image Analysis is usu- 
ally difficult, because of an appeal to a "human visual perception" 
which is central in this field. For example, the main compression re- 
quirement is to preserve image's "visual quality" - the notion which is 
well known to escape any attempt of a rigorous mathematical defini- 
tion. 

Still, simple characteristics of image approximation, like the
$L^2$-error, while not completely adequate to the "human visual perception"
results, are usually very instructive. In the present section we shall try
to translate the rigorous results of Sections 2-4 about piecewise constant
functions to the language of images. For reasons that become clear below we
believe that our conclusions (which we call "statements" so as not to
confuse them with theorems) are as accurate as possible: they can be made
rigorous by accurately restricting the set of allowed images we work with.

5.1. Linear space of images. A typical image is represented by a
rectangular array of pixels (say, 512 × 512). At each pixel the discretized
value of the brightness (or the color) is stored: typically 8 bits, or 256
brightness values, for grey-level images, and 24 bits for three-color RGB
images. In this paper we shall ignore the discrete nature of digital
images, and consider them as bounded functions on the square $Q^2$. (See
[351 H] for the discussion of some specific problems related to the dis- 
crete nature of images). 

To make the space of images a linear one, we have to ignore another
important feature of true images: the image brightness has always to be
within the prescribed interval (say, [0, 255]). So we cannot add images as
usual functions. Still, it is convenient to consider images as the elements
of the Hilbert space $\mathcal{L} = L^2(Q^2)$ of functions on the unit
two-dimensional cell $Q^2$.

However, considering images as elements in the linear space $\mathcal{L}$
stresses their non-linear nature. Let us mention some of the most immediate
manifestations of this important fact.

1. First of all, addition (or, more generally, forming linear combi- 
nations of images) usually produces a new brightness function, which 
is difficult to interpret as a meaningful "image". Indeed, such a sum 
will show an artificial overlapping of the objects appearing on each one 
of the original images. Only for images representing exactly the same 
scene (like, for example, the three color separations R, G, B of the same 
color picture) their linear combinations have a direct visual meaning. 

2. Secondly, only a small fraction of the standard image processing
operations (like high-pass and low-pass filtering) are linear
transformations of the Hilbert space $\mathcal{L}$. Most of the usual image
processing operations (as represented, for example, in Adobe's "Photoshop"
package) take into account the visual patterns on the image. Consequently,
the processing is subordinated to the geometry of the objects on the image,
and in this way it is highly non-linear.

3. Third, individual images depend in a highly non-linear way on
the boundary data of the objects.

4. Finally, the most important time-dependent families of images -
video-sequences - turn out to be very complicated curves in $\mathcal{L}$.
In fact, as we shall see below, they behave geometrically as the "crinkled
arcs" considered in Section 2.

Let us consider in more detail the effect of a motion of objects on
the image: this is the main content of typical video-sequences. First,
to simplify considerations, let us assume that the objects are perfectly
black while the background is perfectly white. Then our images, as the
elements of the Hilbert space $\mathcal{L}$, are just the (negative)
characteristic functions of the domains occupied by the objects on the
image.

If an object moves in such a way that the occupied domains are
expanding (for example, the object approaches the camera) then the
corresponding trajectory in the space of images $\mathcal{L}$ is a crinkled
arc by Proposition 2.4.
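
The crinkled-arc property of an expanding object can be seen directly on a
pixel grid. The sketch below (Python) builds the characteristic functions of
a growing disc and checks that two disjoint chords of the resulting
trajectory are orthogonal. The grid size, the radii and the function names
are assumptions made for illustration, and the sign convention (object = 1)
does not affect orthogonality.

```python
def disc_image(radius, size=64):
    """Characteristic function of a centered disc on a size x size pixel grid."""
    c = (size - 1) / 2.0
    return [[1.0 if (i - c) ** 2 + (j - c) ** 2 <= radius ** 2 else 0.0
             for j in range(size)] for i in range(size)]

def chord(img_a, img_b):
    """Difference img_b - img_a: the chord of the trajectory between two frames."""
    return [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(img_a, img_b)]

def inner(u, v):
    """Discrete L2 inner product (pixel area taken as 1)."""
    return sum(x * y for row_u, row_v in zip(u, v) for x, y in zip(row_u, row_v))

frames = [disc_image(r) for r in (5, 10, 15, 20)]    # an expanding disc
first = chord(frames[0], frames[1])                   # supported on the annulus 5 < |p| <= 10
second = chord(frames[2], frames[3])                  # supported on the annulus 15 < |p| <= 20
print(inner(first, second), inner(first, first))
```

The two chords are supported on disjoint annuli, so their inner product
vanishes, while each chord has squared norm equal to the area swept between
the corresponding frames.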
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If the objects move in a more complicated way (in particular, their 
boundaries are deformed in a non-rigid manner), then the correspond- 
ing trajectory in the image space may not be a crinkled arc. However, 
Proposition 2.5 shows that typically such trajectories look like crinkled 
arcs "in a small scale": the angle between any two disjoint chords of
the corresponding curve in $\mathcal{L}$ tends to $\frac{\pi}{2}$ as these
chords tend to the same point.

The conclusion of Proposition 2.5 remains essentially valid also under 
more realistic assumptions on the color of the moving objects: indeed, 
near the object boundaries the image brightness in any case behaves as 
a scalar multiple of the characteristic function of the occupied domain. 

Moreover, the occlusions of the moving objects do not change this 
pattern. Indeed, only at the intersections of the boundaries of the 
occluded objects we can expect new phenomena, but typically these 
intersections have nearly zero area. So they do not affect the $L^2$
geometry of the trajectory.

Thus we get a general (and, in our view, rather surprising) conclusion:

Statement 5.1 A typical video-sequence is metrically similar to a
"crinkled arc" in the Hilbert space $\mathcal{L}$ of images. In particular,
its $\epsilon$-entropy and Kolmogorov $n$-width behave as those of the curve
$\mathcal{H}$.

As in Sections 2-4 above, this fact provides an immediate limitation
on the performance of linear approximation and acquisition methods.
Let us assume that we want to represent all the images in a set
$Q \subset \mathcal{L}$ which is "large enough": namely, it contains
together with each image $I$ also images representing a "motion of objects"
in $I$ (in particular, their zoom, translations, etc.). Hence the set $Q$ in
fact contains "video-sequences", and Statement 5.1 implies:

Statement 5.2 No $n$-dimensional linear subspace $W$ in the Hilbert
space $\mathcal{L}$ of images can approximate all the images in $Q$ at once
with accuracy better than $C_1\frac{1}{\sqrt{n}}$.

As for the problem of image acquisition from measurements, we get
the following conclusion, analogous to that of Theorem 4.1:

Statement 5.3 Let the image acquisition process comprise taking $n$
measurements $m_i(I)$, $i = 1, \dots, n$ (linear or non-linear) of the image
$I$, together with a subsequent processing $P$ of the measurements. If the
processing operator $\tilde I = P(m_1, \dots, m_n)$ is a linear one then for
some $I \in Q$ the error $\|I - \tilde I\|$ is at least
$C_1\frac{1}{\sqrt{n}}$.
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The inherent limitation of linear acquisition and representation meth- 
ods forced development of non-linear approaches. Most of them utilize 
the fact that wavelet representations of typical images in appropriate 
wavelet bases are "sparse" - with only few "large" coefficients. Efficient 
image capturing and representation approaches based on this fact are 
given, in particular in [91 091 HTJ l5lj . 

Another approach is based on "non-linear model-net approximation" 
( [U EDI E3] ) • In the present paper it was described in a "toy example" 
of piecewise-constant functions. As far as images are concerned, the main
problem is whether such a "model-based" representation is possible at
all. Let us briefly discuss the "state of the art" here.

5.2. Capturing of images and video-sequences by geometric 
models. As it was stressed in the introduction, the key to a successful 
application of the "algebraic reconstruction methods" presented in this 
paper to real problems in Signal Processing lies in a "model-based" 
representation of signals and especially of images. 

From the point of view pursued in this paper, most of the conven- 
tional image representation ("compression") methods can be consid- 
ered as "semi-linear" : their starting point is a linear representation of 
the image in a certain basis (Fourier, local Fourier, Wavelets ...). Then 
the coefficients of this linear representation are truncated, ordered and 
finally encoded in a highly non-linear way. 

There are "geometric" methods of image representation, based on an 
approximation by non-linear image models (usually constructed from 
the edges, ridges and other geometric visual patterns appearing in typ- 
ical images) - see [3H1 ESI HI I2D|- Some of these geometric methods 
have proved themselves to be very efficient in a representation and 
processing of special types of images (like geographic maps, cartoon 
animations, etc.). 

However, in general the "geometric" methods, as of today, suffer
from an inability to achieve a full visual quality for high resolution
photo-realistic images of the real world. In fact, the mere possibility
of faithfully capturing such images with geometric models presents one
of the important open problems in Image Processing, sometimes called "the
vectorization problem".

Let us stress our strong belief that a full visual quality "geometric"
representation for high resolution photo-realistic images of the real
world is possible. Once achieved, it promises to bring a major advance
in image compression and capturing, in particular, via the approach of
the present paper.






5.3. Reconstruction of images from measurements. Let a func- 
tion f(x,y) of two variables be the brightness function of an image 
to be reconstructed from Computer Tomography measurements. The 
data of the Radon transform can be translated into the Fourier data, 
so we can assume that our measurements are just Fourier coefficients
of $f$.

Now our general approach to this problem extends the non-linear 
inversion method presented in Section 4 above. It can be summarized 
as follows: 

1. We obtain the "first approximation" $\tilde f$ of the function $f$ to be
reconstructed by one of the available conventional methods.

2. We approximate the function $\tilde f$ (and hence also the function $f$
to be reconstructed) by a "model one" $Mf$. The latter comprises "simple
geometric models" reflecting the structure of singularities of $\tilde f$
and approximating $\tilde f$ at its regular regions.

Let us stress once more, as we did in our discussion above, that 
the mere existence of such a representation for real world images is an 
important open problem. 

3. We memorize the "combinatorial structure" of Mf (the number 
of its jumps in one-dimensional case; the topological structure of the 
edges, ridges, patches etc. for images). 

4. We consider specific geometric and brightness parameters of the
models as unknowns, which we substitute into a system (***) obtained
in the same way as the system (4.1) above. The right-hand side of this
system is formed by the measured Fourier coefficients of $f$.

5. We solve the appropriate subsystem of the system (***). In the
solution process we start with the approximate solution obtained in
step 2. The solution provides a set of the improved parameters of
the model function $Mf$. Applying these parameters we finally get an
improved approximation $Mf$ of the original function $f$.

The implementation of this program in real applications of Computer 
Tomography is now in its initial stages. In "toy problems" where we 
pretend to know a priori the model structure of the image, the approach 
works perfectly (not a big surprise! see [51]). We believe however that 
the time is ripe to study both the Image Processing and the Algebraic- 
Geometric parts of the problem. 

5.4. Geometric image compression and crinkled arcs. The following
remarks concern a possibility to use directly the universality of
the "crinkled arc" in image compression. However, it involves encoding
of certain isometries of Hilbert spaces, and the feasibility and complexity
of this task should be considered as an absolutely open problem.

One of the most important tasks in the "geometric image compres- 
sion" is a compact representation of the systems of curves and points 
on the plane (see, in particular, [1], [2], [3]). Mostly we can assume
these curves and points to be mutually disjoint, and their specific 
parametrization is not essential. 

Let us consider a special case where the family of curves to be memorized
is the family of the boundaries of a family of expanding domains in the
plane. It turns out to be difficult to utilize this special structure
in the curve compression methods used in [1], [20]. In fact, according
to these methods, each curve will be stored separately. On the other
hand, by Propositions 2.4, 2.5 the characteristic functions of the inside
domains of our curves form a crinkled arc in $L^2(Q^2)$ isomorphic to
$\mathcal{H}$. Consequently, we have an alternative approach to memorizing
our family of curves: it is enough to memorize the transformations bringing
it to $\mathcal{H}$. This leads to two mathematical problems which we
consider as important by themselves:

Problem 1. What is the complexity of the "normalizing transforma- 
tion" in Theorem 2.2, and specifically, in Propositions 2.4, 2.5? (We 
can use, for example, the notion of complexity for infinite-dimensional 
objects, introduced in [601 El].) How many bits do we need to memorize
them?

Problem 2. How to use "geometric redundancy" of the expanding 
family - the fact that the curves do not intersect and "bound one an- 
other" - in their "conventional" compression? 
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